\newpage

\definecolor{dkgreen}{rgb}{0,0.6,0}
\definecolor{gray}{rgb}{0.5,0.5,0.5}
\definecolor{mauve}{rgb}{0.58,0,0.82}

\lstset{ %
	language=Matlab,                % the language of the code
	basicstyle=\footnotesize,           % the size of the fonts that are used for the code
	numbers=left,                   % where to put the line-numbers
	numberstyle=\tiny\color{gray},  % the style that is used for the line-numbers
	stepnumber=2,                   % the step between two line-numbers. If it's 1, each line 
	% will be numbered
	numbersep=5pt,                  % how far the line-numbers are from the code
	backgroundcolor=\color{white},      % choose the background color. You must add \usepackage{color}
	showspaces=false,               % show spaces adding particular underscores
	showstringspaces=false,         % underline spaces within strings
	showtabs=false,                 % show tabs within strings adding particular underscores
	frame=single,                   % adds a frame around the code
	rulecolor=\color{black},        % if not set, the frame-color may be changed on line-breaks within not-black text (e.g. commens (green here))
	tabsize=2,                      % sets default tabsize to 2 spaces
	captionpos=b,                   % sets the caption-position to bottom
	breaklines=true,                % sets automatic line breaking
	breakatwhitespace=false,        % sets if automatic breaks should only happen at whitespace
	title=\lstname,                   % show the filename of files included with \lstinputlisting;
	% also try caption instead of title
	keywordstyle=\color{blue},          % keyword style
	commentstyle=\color{dkgreen},       % comment style
	stringstyle=\color{mauve},         % string literal style
	escapeinside={\%*}{*)},            % if you want to add LaTeX within your code
	morekeywords={*,...}               % if you want to add more keywords to the set
}

\section{上手实践}

以上对迁移学习基本方法的介绍还只是停留在算法的阶段。对于初学者来说，不仅需要掌握基础的算法知识，更重要的是，需要在实验中发现问题。本章的目的是为初学者提供一个迁移学习上手实践的介绍。通过一步步编写代码、下载数据，完成迁移学习任务。在本部分，我们以迁移学习中最为流行的图像分类为实验对象，在流行的Office+Caltech10数据集上完成。

迁移学习方法主要包括：传统的非深度迁移、深度网络的finetune、深度网络自适应、以及深度对抗网络的迁移。教程的目的是抛砖引玉，帮助初学者快速入门。由于网络上已有成型的深度网络的finetune、深度网络自适应、以及深度对抗网络的迁移教程，因此我们不再叙述这些方法，只在这里介绍非深度方法的教程。其他三种方法的地址分别是：

\begin{itemize}
	\item 深度网络的finetune：\href{https://github.com/jindongwang/transferlearning/tree/master/code/deep/finetune_AlexNet_ResNet}{用Pytorch对Alexnet和Resnet进行微调}、\href{https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html}{使用PyTorch进行finetune}
	\item 深度网络的自适应：\href{https://github.com/jindongwang/transferlearning/tree/master/code/deep/DDC_DeepCoral}{DDC/DCORAL方法的Pytorch代码}
	\item 深度对抗网络迁移：\href{https://github.com/jindongwang/transferlearning/tree/master/code/deep/DANN(RevGrad)}{DANN方法}
\end{itemize}

更多深度迁移方法的代码，请见\url{https://github.com/jindongwang/transferlearning/tree/master/code/deep}。

%\subsection{非深度迁移}

在众多的非深度迁移学习方法中，我们选择最经典的迁移方法之一、发表于IEEE TNN 2011的TCA(Transfer Component Analysis)~\cite{pan2011domain}方法进行实践。为了便于学习，我们同时用Matlab和Python实现了此代码。代码的链接为\url{https://github.com/jindongwang/transferlearning/tree/master/code/traditional/TCA}。下面我们对代码进行简单讲解。

\subsection{TCA方法代码实现}

\subsubsection{Matlab}

\textbf{1. 数据获取}

由于我们要测试非深度方法，因此，选择SURF特征文件作为算法的输入。SURF特征文件可以从网络上~\footnote{\url{https://pan.baidu.com/s/1bp4g7Av}}下载。下载到的文件主要包含4个.mat文件：Caltech.mat, amazon.mat, webcam.mat, dslr.mat。它们恰巧对应4个不同的领域。彼此之间两两一组，就是一个迁移学习任务。每个数据文件包含两个部分：fts为800维的特征，labels为对应的标注。在测试中，我们选择由Caltech.mat作为源域，由amazon.mat作为目标域。Office+Caltech10数据集的介绍可以在本手册的第~\ref{sec-dataset}部分找到。

我们对数据进行加载并做简单的归一化，将最后的数据存入$X_s,Y_s,X_t,Y_t$这四个变量中。这四个变量分别对应源域的特征和标注、以及目标域的特征和标注。代码如下：

\begin{lstlisting}[title=Matlab加载数据, frame=shadowbox]
load('Caltech.mat');     % source domain
fts = fts ./ repmat(sum(fts,2),1,size(fts,2)); 
Xs = zscore(fts,1);    clear fts
Ys = labels;           clear labels

load('amazon.mat');    % target domain
fts = fts ./ repmat(sum(fts,2),1,size(fts,2)); 
Xt = zscore(fts,1);     clear fts
Yt = labels;            clear labels
\end{lstlisting}

\textbf{2. 算法精炼}

TCA主要进行边缘分布自适应。通过整理化简，TCA最终的求解目标是：
\begin{equation}
\label{equ-eigen}
\begin{split}
\left(\mathbf{X} \mathbf{M} \mathbf{X}^\top + \lambda \mathbf{I}\right) \mathbf{A} =\mathbf{X} \mathbf{H} \mathbf{X}^\top \mathbf{A} \Phi 
\end{split}
\end{equation}

上述表达式可以通过Matlab自带的$\mathrm{eigs()}$函数直接求解。$\mathbf{A}$就是我们要求解的变换矩阵。下面我们需要明确各个变量所指代的含义：

\begin{itemize}
	\item $\mathbf{X}$: 由源域和目标域数据共同构成的数据矩阵
	\item $C$: 总的类别个数。在我们的数据集中，$C=10$
	\item $\mathbf{M}_c$: MMD矩阵。当$c=0$时为全MMD矩阵；当$c>1$时对应为每个类别的矩阵。
	\item $\mathbf{I}$：单位矩阵
	\item $\lambda$：平衡参数，直接给出
	\item $\mathbf{H}$: 中心矩阵，直接计算得出
	\item $\Phi$: 拉格朗日因子，不用理会，求解用不到
\end{itemize}

\textbf{3. 编写代码}

我们直接给出精炼后的源码：

\begin{lstlisting}[title=TCA方法的Matlab实现, frame=shadowbox]
function [X_src_new,X_tar_new,A] = TCA(X_src,X_tar,options)
% The is the implementation of Transfer Component Analysis.
% Reference: Sinno Pan et al. Domain Adaptation via Transfer Component Analysis. TNN 2011.

% Inputs: 
%%% X_src          :    source feature matrix, ns * n_feature
%%% X_tar          :    target feature matrix, nt * n_feature
%%% options        :    option struct
%%%%% lambda       :    regularization parameter
%%%%% dim          :    dimensionality after adaptation (dim <= n_feature)
%%%%% kernel_tpye  :    kernel name, choose from 'primal' | 'linear' | 'rbf'
%%%%% gamma        :    bandwidth for rbf kernel, can be missed for other kernels

% Outputs: 
%%% X_src_new      :    transformed source feature matrix, ns * dim
%%% X_tar_new      :    transformed target feature matrix, nt * dim
%%% A              :    adaptation matrix, (ns + nt) * (ns + nt)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% Set options
lambda = options.lambda;              
dim = options.dim;                    
kernel_type = options.kernel_type;    
gamma = options.gamma;                

%% Calculate
X = [X_src',X_tar'];
X = X*diag(sparse(1./sqrt(sum(X.^2))));
[m,n] = size(X);
ns = size(X_src,1);
nt = size(X_tar,1);
e = [1/ns*ones(ns,1);-1/nt*ones(nt,1)];
M = e * e';
M = M / norm(M,'fro');
H = eye(n)-1/(n)*ones(n,n);
if strcmp(kernel_type,'primal')
[A,~] = eigs(X*M*X'+lambda*eye(m),X*H*X',dim,'SM');
Z = A' * X;
Z = Z * diag(sparse(1./sqrt(sum(Z.^2))));
X_src_new = Z(:,1:ns)';
X_tar_new = Z(:,ns+1:end)';
else
K = TCA_kernel(kernel_type,X,[],gamma);
[A,~] = eigs(K*M*K'+lambda*eye(n),K*H*K',dim,'SM');
Z = A' * K;
Z = Z*diag(sparse(1./sqrt(sum(Z.^2))));
X_src_new = Z(:,1:ns)';
X_tar_new = Z(:,ns+1:end)';
end
end

% With Fast Computation of the RBF kernel matrix
% To speed up the computation, we exploit a decomposition of the Euclidean distance (norm)
%
% Inputs:
%       ker:    'linear','rbf','sam'
%       X:      data matrix (features * samples)
%       gamma:  bandwidth of the RBF/SAM kernel
% Output:
%       K: kernel matrix
%
% Gustavo Camps-Valls
% 2006(c)
% Jordi (jordi@uv.es), 2007
% 2007-11: if/then -> switch, and fixed RBF kernel
% Modified by Mingsheng Long
% 2013(c)
% Mingsheng Long (longmingsheng@gmail.com), 2013
function K = TCA_kernel(ker,X,X2,gamma)

switch ker
case 'linear'

if isempty(X2)
K = X'*X;
else
K = X'*X2;
end

case 'rbf'

n1sq = sum(X.^2,1);
n1 = size(X,2);

if isempty(X2)
D = (ones(n1,1)*n1sq)' + ones(n1,1)*n1sq -2*X'*X;
else
n2sq = sum(X2.^2,1);
n2 = size(X2,2);
D = (ones(n2,1)*n1sq)' + ones(n1,1)*n2sq -2*X'*X2;
end
K = exp(-gamma*D); 

case 'sam'

if isempty(X2)
D = X'*X;
else
D = X'*X2;
end
K = exp(-gamma*acos(D).^2);

otherwise
error(['Unsupported kernel ' ker])
end
end

\end{lstlisting}

我们将TCA方法包装成函数$\mathrm{TCA}$。注意到TCA是一个无监督迁移方法，不需要接受label作为参数。因此，函数共接受3个输入参数：

\begin{itemize}
	\item $\mathrm{X_{src}}$: 源域的特征，大小为$n_s \times m$
	\item $\mathrm{X_{tar}}$: 目标域的特征，大小为$n_t \times m$
	\item $\mathrm{options}$: 参数结构体，它包含：
	\begin{itemize}
		\item $\lambda$: 平衡参数，可以自由给出
		\item $dim$: 算法最终选择将数据将到多少维
		\item $kernel type$: 选择的核类型，可以选择RBF、线性、或无核
		\item $\gamma$: 如果选择RBF核，那么它的宽度为$\gamma$
	\end{itemize}
\end{itemize}

函数的输出包含3项：
\begin{itemize}
	\item $X_{srcnew}$: TCA后的源域
	\item $X_{tarnew}$: TCA后的目标域
	\item $A$: 最终的变换矩阵
\end{itemize}

\textbf{4. 测试算法}

我们使用如下的代码对TCA算法进行测试：

\begin{lstlisting}
options.gamma = 2;          % the parameter for kernel
options.kernel_type = 'linear';
options.lambda = 1.0;
options.dim = 20;
[X_src_new,X_tar_new,A] = TCA(Xs,Xt,options);

% Use knn to predict the target label
knn_model = fitcknn(X_src_new,Y_src,'NumNeighbors',1);
Y_tar_pseudo = knn_model.predict(X_tar_new);
acc = length(find(Y_tar_pseudo==Y_tar))/length(Y_tar); 
fprintf('Acc=%0.4f\n',acc);
\end{lstlisting}

结果显示如下：
\begin{lstlisting}
	Acc=0.4499
\end{lstlisting}

\subsubsection{Python}

与Matlab代码类似，我们也可以用Python对TCA进行实现，其主要依赖于Numpy和Scipy两个强大的科学计算库。Python版本的TCA代码如下：

\begin{lstlisting}[title=TCA方法的Python实现, frame=shadowbox]

import numpy as np
import scipy.io
import scipy.linalg
import sklearn.metrics
from sklearn.neighbors import KNeighborsClassifier


def kernel(ker, X1, X2, gamma):
K = None
if not ker or ker == 'primal':
K = X1
elif ker == 'linear':
if X2 is not None:
K = sklearn.metrics.pairwise.linear_kernel(np.asarray(X1).T, np.asarray(X2).T)
else:
K = sklearn.metrics.pairwise.linear_kernel(np.asarray(X1).T)
elif ker == 'rbf':
if X2 is not None:
K = sklearn.metrics.pairwise.rbf_kernel(np.asarray(X1).T, np.asarray(X2).T, gamma)
else:
K = sklearn.metrics.pairwise.rbf_kernel(np.asarray(X1).T, None, gamma)
return K


class TCA:
def __init__(self, kernel_type='primal', dim=30, lamb=1, gamma=1):
'''
Init func
:param kernel_type: kernel, values: 'primal' | 'linear' | 'rbf'
:param dim: dimension after transfer
:param lamb: lambda value in equation
:param gamma: kernel bandwidth for rbf kernel
'''
self.kernel_type = kernel_type
self.dim = dim
self.lamb = lamb
self.gamma = gamma

def fit(self, Xs, Xt):
'''
Transform Xs and Xt
:param Xs: ns * n_feature, source feature
:param Xt: nt * n_feature, target feature
:return: Xs_new and Xt_new after TCA
'''
X = np.hstack((Xs.T, Xt.T))
X /= np.linalg.norm(X, axis=0)
m, n = X.shape
ns, nt = len(Xs), len(Xt)
e = np.vstack((1 / ns * np.ones((ns, 1)), -1 / nt * np.ones((nt, 1))))
M = e * e.T
M = M / np.linalg.norm(M, 'fro')
H = np.eye(n) - 1 / n * np.ones((n, n))
K = kernel(self.kernel_type, X, None, gamma=self.gamma)
n_eye = m if self.kernel_type == 'primal' else n
a, b = np.linalg.multi_dot([K, M, K.T]) + self.lamb * np.eye(n_eye), np.linalg.multi_dot([K, H, K.T])
w, V = scipy.linalg.eig(a, b)
ind = np.argsort(w)
A = V[:, ind[:self.dim]]
Z = np.dot(A.T, K)
Z /= np.linalg.norm(Z, axis=0)
Xs_new, Xt_new = Z[:, :ns].T, Z[:, ns:].T
return Xs_new, Xt_new

def fit_predict(self, Xs, Ys, Xt, Yt):
'''
Transform Xs and Xt, then make predictions on target using 1NN
:param Xs: ns * n_feature, source feature
:param Ys: ns * 1, source label
:param Xt: nt * n_feature, target feature
:param Yt: nt * 1, target label
:return: Accuracy and predicted_labels on the target domain
'''
Xs_new, Xt_new = self.fit(Xs, Xt)
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(Xs_new, Ys.ravel())
y_pred = clf.predict(Xt_new)
acc = sklearn.metrics.accuracy_score(Yt, y_pred)
return acc, y_pred


if __name__ == '__main__':
domains = ['caltech.mat', 'amazon.mat', 'webcam.mat', 'dslr.mat']
for i in [2]:
for j in [3]:
if i != j:
src, tar = 'data/' + domains[i], 'data/' + domains[j]
src_domain, tar_domain = scipy.io.loadmat(src), scipy.io.loadmat(tar)
Xs, Ys, Xt, Yt = src_domain['feas'], src_domain['label'], tar_domain['feas'], tar_domain['label']
tca = TCA(kernel_type='linear', dim=30, lamb=1, gamma=1)
acc, ypre = tca.fit_predict(Xs, Ys, Xt, Yt)
print(acc)

\end{lstlisting}

\textbf{5. 小结}

通过以上过程，我们分别使用Matlab代码和Python代码对经典的TCA方法进行了实验，完成了一个迁移学习任务。其他的非深度迁移学习方法，均可以参考上面的过程。值得庆幸的是，许多论文的作者都公布了他们的文章代码，以方便我们进行接下来的研究。读者可以从Github~\footnote{\url{https://github.com/jindongwang/transferlearning/tree/master/code}}或者相关作者的网站上获取其他许多方法的代码。

\subsection{深度网络的finetune代码实现}

本小节我们用Pytorch实现一个深度网络的finetune。Pytorch是一个较为流行的深度学习工具包，由Facebook进行开发，在Github~\footnote{\url{https://github.com/pytorch/pytorch}}上也进行了开源。

Finetune指的是训练好的深度网络，拿来在新的目标域上进行微调。因此，我们假定读者具有基本的Pytorch知识，直接给出finetune的代码。完整的代码可以在这里~\footnote{\url{https://github.com/jindongwang/transferlearning/tree/master/code/deep/finetune_AlexNet_ResNet}}找到。

我们定义一个叫做finetune的函数，它接受输入的一个已有模型，从目标数据中进行微调，输出最好的模型其结果。其代码如下：

\begin{lstlisting}[title=深度网络的finetune代码实现, frame=shadowbox]

def finetune(model, dataloaders, optimizer):
since = time.time()
best_acc = 0.0
acc_hist = []
criterion = nn.CrossEntropyLoss()
for epoch in range(1, N_EPOCH + 1):
# lr_schedule(optimizer, epoch)
print('Learning rate: {:.8f}'.format(optimizer.param_groups[0]['lr']))
print('Learning rate: {:.8f}'.format(optimizer.param_groups[-1]['lr']))
for phase in ['src', 'val', 'tar']:
if phase == 'src':
model.train()
else:
model.eval()
total_loss, correct = 0, 0
for inputs, labels in dataloaders[phase]:
inputs, labels = inputs.to(DEVICE), labels.to(DEVICE)
optimizer.zero_grad()
with torch.set_grad_enabled(phase == 'src'):
outputs = model(inputs)
loss = criterion(outputs, labels)
preds = torch.max(outputs, 1)[1]
if phase == 'src':
loss.backward()
optimizer.step()
total_loss += loss.item() * inputs.size(0)
correct += torch.sum(preds == labels.data)
epoch_loss = total_loss / len(dataloaders[phase].dataset)
epoch_acc = correct.double() / len(dataloaders[phase].dataset)
acc_hist.append([epoch_loss, epoch_acc])
print('Epoch: [{:02d}/{:02d}]---{}, loss: {:.6f}, acc: {:.4f}'.format(epoch, N_EPOCH, phase, epoch_loss,
epoch_acc))
if phase == 'tar' and epoch_acc > best_acc:
best_acc = epoch_acc
print()
fname = 'finetune_result' + model_name + \
str(LEARNING_RATE) + str(args.source) + \
'-' + str(args.target) + '.csv'
np.savetxt(fname, np.asarray(a=acc_hist, dtype=float), delimiter=',',
fmt='%.4f')
time_pass = time.time() - since
print('Training complete in {:.0f}m {:.0f}s'.format(
time_pass // 60, time_pass % 60))

return model, best_acc, acc_hist

\end{lstlisting}

其中，model可以是由任意深度网络训练好的模型，如Alexnet、Resnet等。

另外，有很多任务也需要用到深度网络来提取深度特征以便进一步处理。我们也进行了实现，代码在\url{https://github.com/jindongwang/transferlearning/blob/master/code/feature_extractor}中。

%
\subsection{深度网络自适应代码}

我们仍然以Pytorch为例，实现深度网络的自适应。具体地说，实现经典的DDC (Deep Domain Confusion)~\cite{tzeng2014deep}方法和与其类似的DCORAL (Deep CORAL)~\cite{sun2016deep}方法。

此网络实现的核心是：如何正确计算DDC中的MMD损失、以及DCORAL中的CORAL损失，并且与神经网络进行集成。此部分对于初学者难免有一些困惑。如何输入源域和目标域、如何进行判断？因此，我们认为此部分应该是深度迁移学习的基础代码，读者应该努力地进行学习和理解。

\textbf{网络结构}

首先我们要定义好网络的架构，其应该是来自于已有的网络结构，如Alexnet和Resnet。但不同的是，由于要进行深度迁移适配，因此，输出层要和finetune一样，和目标的类别数相同。其二，由于要进行距离的计算，我们需要加一个叫做bottleneck的层，用来将最高维的特征进行降维，然后进行距离计算。当然，bottleneck层不加尚可。

我们的网络结构如下所示：

\begin{lstlisting}[title=深度迁移网络代码实现, frame=shadowbox]
import torch.nn as nn
import torchvision
from Coral import CORAL
import mmd
import backbone


class Transfer_Net(nn.Module):
def __init__(self, num_class, base_net='resnet50', transfer_loss='mmd', use_bottleneck=True, bottleneck_width=256, width=1024):
super(Transfer_Net, self).__init__()
self.base_network = backbone.network_dict[base_net]()
self.use_bottleneck = use_bottleneck
self.transfer_loss = transfer_loss
bottleneck_list = [nn.Linear(self.base_network.output_num(
), bottleneck_width), nn.BatchNorm1d(bottleneck_width), nn.ReLU(), nn.Dropout(0.5)]
self.bottleneck_layer = nn.Sequential(*bottleneck_list)
classifier_layer_list = [nn.Linear(self.base_network.output_num(), width), nn.ReLU(), nn.Dropout(0.5),
nn.Linear(width, num_class)]
self.classifier_layer = nn.Sequential(*classifier_layer_list)

self.bottleneck_layer[0].weight.data.normal_(0, 0.005)
self.bottleneck_layer[0].bias.data.fill_(0.1)
for i in range(2):
self.classifier_layer[i * 3].weight.data.normal_(0, 0.01)
self.classifier_layer[i * 3].bias.data.fill_(0.0)

def forward(self, source, target):
source = self.base_network(source)
target = self.base_network(target)
source_clf = self.classifier_layer(source)
if self.use_bottleneck:
source = self.bottleneck_layer(source)
target = self.bottleneck_layer(target)
transfer_loss = self.adapt_loss(source, target, self.transfer_loss)
return source_clf, transfer_loss

def predict(self, x):
features = self.base_network(x)
clf = self.classifier_layer(features)
return clf

\end{lstlisting}

其中Transfer Net是整个网络的模型定义。它接受参数有：

\begin{itemize}
	\item num class: 目标域类别数
	\item base net: 主干网络，例如Resnet等，也可以是自己定义的网络结构
	\item Transfer loss: 迁移的损失，比如MMD和CORAL，也可以是自己定义的损失
	\item use bottleneck: 是否使用bottleneck
	\item bottleneck width: bottleneck的宽度
	\item width: 分类器层的width
\end{itemize}

\textbf{迁移损失定义}

迁移损失是核心。其定义如下：

\begin{lstlisting}[title=深度迁移网络代码实现, frame=shadowbox]

 def adapt_loss(self, X, Y, adapt_loss):
"""Compute adaptation loss, currently we support mmd and coral
Arguments:
X {tensor} -- source matrix
Y {tensor} -- target matrix
adapt_loss {string} -- loss type, 'mmd' or 'coral'. You can add your own loss
Returns:
[tensor] -- adaptation loss tensor
"""
if adapt_loss == 'mmd':
mmd_loss = mmd.MMD_loss()
loss = mmd_loss(X, Y)
elif adapt_loss == 'coral':
loss = CORAL(X, Y)
else:
loss = 0
return loss
\end{lstlisting}

其中的MMD和CORAL是自己实现的两个loss，MMD对应DDC方法，CORAL对应DCORAL方法。其代码在上述github中可以找到，我们不再赘述。

\textbf{训练}

训练时，我们一次输入一个batch的源域和目标域数据。为了方便，我们使用pytorch自带的dataloader。

\begin{lstlisting}[title=深度迁移网络代码实现, frame=shadowbox]
def train(source_loader, target_train_loader, target_test_loader, model, optimizer, CFG):
len_source_loader = len(source_loader)
len_target_loader = len(target_train_loader)
train_loss_clf = utils.AverageMeter()
train_loss_transfer = utils.AverageMeter()
train_loss_total = utils.AverageMeter()
for e in range(CFG['epoch']):
model.train()
iter_source, iter_target = iter(
source_loader), iter(target_train_loader)
n_batch = min(len_source_loader, len_target_loader)
criterion = torch.nn.CrossEntropyLoss()
for i in range(n_batch):
data_source, label_source = iter_source.next()
data_target, _ = iter_target.next()
data_source, label_source = data_source.to(
DEVICE), label_source.to(DEVICE)
data_target = data_target.to(DEVICE)

optimizer.zero_grad()
label_source_pred, transfer_loss = model(data_source, data_target)
clf_loss = criterion(label_source_pred, label_source)
loss = clf_loss + CFG['lambda'] * transfer_loss
loss.backward()
optimizer.step()
train_loss_clf.update(clf_loss.item())
train_loss_transfer.update(transfer_loss.item())
train_loss_total.update(loss.item())
if i % CFG['log_interval'] == 0:
print('Train Epoch: [{}/{} ({:02d}%)], cls_Loss: {:.6f}, transfer_loss: {:.6f}, total_Loss: {:.6f}'.format(
e + 1,
CFG['epoch'],
int(100. * i / n_batch), train_loss_clf.avg, train_loss_transfer.avg, train_loss_total.avg))

# Test
test(model, target_test_loader)
\end{lstlisting}

%
%\subsection{深度对抗网络迁移}